Method of preparing subject-specific immunogenic compositions based on a neo open-reading-frame peptide database

ABSTRACT

The present invention relates generally to the identification of tumor specific neo open-reading-frame peptides (NOPs) and the uses of these NOPs to produce cancer vaccines and the like. More particularly, the invention relates to identifying at least one neoantigen in a patient and, based thereupon, preparing a subject-specific immunogenic composition. With the present invention it becomes possible to provide off-the-shelf cancer vaccines and the like within a short period of time and for potentially 30% of the total population of patients suffering from cancer.

FIELD OF THE INVENTION

The present invention relates generally to the identification of tumor specific neo open-reading-frame peptides (NOPs) and the uses of these NOPs to produce cancer vaccines and the like. More particularly, the invention relates to identifying at least one neoantigen in a patient and, based thereupon, preparing a subject-specific immunogenic composition. With the present invention it becomes possible to provide off-the-shelf personalized cancer vaccines and the like within a short period of time and for a substantial percentage, up to 30%, of the total population of patients suffering from cancer.

BACKGROUND OF THE INVENTION

There are a number of different existing cancer therapies, including ablation techniques (e.g., surgical procedures and radiation) and chemical techniques (e.g., pharmaceutical agents and antibodies), and various combinations of such techniques. Despite intensive research, such therapies are still frequently associated with serious risks, adverse or toxic side effects, and varying efficacy.

There is a growing interest in cancer therapies that aim to target cancer cells with a patient's own immune system (cancer vaccines). Such therapies may indeed eliminate some of the herein-described disadvantages. Cancer vaccines or immunogenic compositions that are intended to treat an existing cancer by strengthening the body's natural defenses against it, and that are based on tumor-specific neoantigens, hold great promise as the next generation of personalized cancer immunotherapy. Evidence shows that such neoantigen-based vaccination can elicit T-cell responses and can cause tumor regression in patients.

Typically the immunogenic compositions/vaccines are composed of tumor antigens (antigenic peptides or nucleic acids encoding them) and may include immunostimulatory molecules, such as cytokines, that work together with the antigens to induce antigen-specific cytotoxic T-cells that target and destroy tumor cells. Vaccines containing tumor-specific and patient-specific neoantigens require sequencing of the patient's genome, as well as the production of personalized compositions comprising a combination of tumor-specific neoantigens present in that individual subject. Sequencing, identifying the specific neoantigens and preparing such personalized compositions may require a substantial amount of time, time which may unfortunately not be available to the patient, given that for some tumors the average survival time after diagnosis is around a year.

Accordingly, there is a need for improved methods and compositions for providing subject-specific immunogenic compositions/cancer vaccines.

In light of this, products, compositions, systems, methods and uses that provide for subject-specific immunogenic compositions/cancer vaccines and that would take away some of the herein-described disadvantages would be highly desirable, but are not yet readily available. In particular, there is a clear need in the art for reliable, efficient and reproducible products, compositions, methods and uses that allow subject-specific and cancer-specific neoantigens/tumor-specific mutant polypeptides to be identified and quickly supplied. Accordingly, the technical problem underlying the present invention can be seen in the provision of such products, compositions, methods and uses for complying with any of the aforementioned needs.

The technical problem is solved by the embodiments characterized in the claims and herein below.

SUMMARY OF THE INVENTION

It is an aim of the present invention to provide for methods for identifying at least one neoantigen for preparing a subject-specific immunogenic composition, the at least one neoantigen comprising a tumor-specific neoepitope, wherein the neoantigen is a tumor-specific mutant polypeptide or part thereof encoded as a result of a tumor-specific frameshift mutation in genes of the subject having cancer.

It is a further objective of the present invention to identify at least one neoantigen (or DNA or RNA encoding such neoantigen) in a tumor sample of a patient suffering from a cancer.

It is a further objective of the present invention to provide for a method of preparing a subject-specific immunogenic composition based on the neoantigens identified in the cancer of the patient.

It is a further objective of the present invention to provide for a method of preparing a subject-specific immunogenic composition within a short period of time.

It is a further objective of the present invention to provide for the possibility of preparing off-the-shelf and subject-specific immunogenic compositions within a short period of time and for a substantial part of the population of patients suffering from cancer.

It is a further objective to be able to provide off-the-shelf subject-specific immunogenic compositions for at least 5%, 7.5%, 20% or even 30% of the population of patients suffering from a cancer.

It is a further objective of the present invention to provide for a method of treating cancer in a patient, in particular wherein the patient is provided with a therapeutic amount of neoantigens as identified with the methods as disclosed herein.

It is a further objective to provide for a method of treating cancer in a patient wherein the patient is provided with a therapeutic amount of a plurality of cancer neoantigen peptides in combination with an anti-CTLA agent, an anti-PD-1 agent, an anti-PD-L1 agent, other stimulants of the immune system or a combination thereof.

It is a further objective to provide for a computerized system for identifying such neoantigens for preparing a subject-specific immunogenic composition.

These and other objectives are achieved by the methods, uses, systems and compositions as defined throughout the description and as defined in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:

FIG. 1: Example of TP53. Panel a shows the amino acid sequence of the p53 peptide and, below it, the pattern of SNVs encountered at the corresponding genome positions in the p53 gene in the TCGA dataset; underneath, black bars represent frame shift mutations in the same dataset.

Panel b shows, for the −1 and +1 frames, the pNOPs (as described and defined in the text; black bars are stop triplets) and the NOPs encountered in patients in the dataset, with every line representing a different tumor in a different patient.

FIG. 2: Schematic overview.

REFERENCE TO A SEQUENCE LISTING

The Sequence listing, which is a part of the present disclosure, includes a text file comprising nucleotide and/or amino acid sequences of the present invention. The subject matter of the Sequence listing is incorporated herein by reference in its entirety. The information recorded in computer readable form is identical to the written sequence listing.

DEFINITIONS

The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.

A portion of this disclosure contains material that is subject to copyright protection (such as, but not limited to, diagrams, device photographs, or any other aspects of this submission for which copyright protection is or may be available in any jurisdiction). The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent Office patent file or records, but otherwise reserves all copyright rights whatsoever.

Various terms relating to the methods, compositions, uses and other aspects of the present invention are used throughout the specification and claims. Such terms are to be given their ordinary meaning in the art to which the invention pertains, unless otherwise indicated. Other specifically defined terms are to be construed in a manner consistent with the definition provided herein. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein.

For purposes of the present invention, the following terms are defined below.

The singular form terms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a cell” includes a combination of two or more cells, and the like.

As used herein, the term “about,” when referring to a value or to an amount of mass, weight, time, volume, concentration or percentage is meant to encompass variations of in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, in some embodiments ±0.5%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed method.

As used herein, ranges can be expressed as from “about” one particular value, and/or to “about” another particular value. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.

The term “and/or” refers to a situation wherein one or more of the stated cases may occur, alone or in combination with at least one of the stated cases, up to with all of the stated cases.

As used herein, the term “at least” a particular value means that particular value or more. For example, “at least 2” is understood to be the same as “2 or more” i.e., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 . . . etc. As used herein, the term “at most” a particular value means that particular value or less. For example, “at most 5” is understood to be the same as “5 or less” i.e., 5, 4, 3 . . . −10, −11, etc.

The term “comprising” is construed as being inclusive and open ended, and not exclusive. Specifically, the term and variations thereof mean the specified features, steps or components are included. These terms are not to be interpreted to exclude the presence of other features, steps or components. It also encompasses the more limiting “to consist of”.

“Exemplary” means “serving as an example, instance, or illustration,” and should not be construed as excluding other configurations disclosed herein.

As used herein, administration or administering in the context of treatment or therapy of a subject is preferably in a “therapeutically effective amount”, this being sufficient to show benefit to the individual. The actual amount administered, and rate and time-course of administration, will depend on the nature and severity of the disease being treated. Prescription of treatment, e.g. decisions on dosage etc., is within the responsibility of general practitioners and other medical doctors, and typically takes account of the disorder to be treated, the condition of the individual patient, the site of delivery, the method of administration and other factors known to practitioners.

As used herein, “therapy” or “treatment” refers to treatment of a tumor with a therapeutic substance. A treatment may involve administration of more than one substance. A substance may be administered alone or in combination with other treatments, either simultaneously or sequentially dependent upon the condition to be treated. For example, the therapy may be a co-therapy involving administration of two agents, one or more of which may be intended to treat the tumor. The substances may be administered simultaneously, separately, or sequentially which may allow the agents to be present in the patient requiring treatment at the same time and thereby provide a combined therapeutic effect, which may be additive or synergistic. The therapy may be administered by one or more routes of administration, e.g. parenteral, intra-arterial injection or infusion, intravenous injection or infusion, intraperitoneal, intratumoral or oral. The therapy may be administered according to a treatment regime. The treatment regime may be a pre-determined timetable, plan, scheme or schedule of therapy administration which may be prepared by a physician or medical practitioner and may be tailored to suit the patient requiring treatment. The treatment regime may indicate one or more of: the type of therapy to administer to the patient; the dose of each drug; the time interval between administrations; the length of each treatment; the number and nature of any treatment holidays, if any etc. For a co-therapy a single treatment regime may be provided which indicates how each drug/agent is to be administered.

The term “cancer” refers to the physiological condition in mammals that is typically characterized by unregulated cell growth. The terms “cancer,” “neoplasm,” and “tumor” are often used interchangeably to describe cells that have undergone a malignant transformation that makes them pathological to the host organism. Primary cancer cells can be distinguished from non-cancerous cells by techniques known to the skilled person. A cancer cell, as used herein, includes not only primary cancer cells, but also cancer cells derived from such primary cancer cells, including metastasized cancer cells, and cell lines derived from cancer cells. Examples include solid tumors and non-solid tumors or blood tumors. Examples of cancers include, without limitation, leukemia, lymphoma, non-Hodgkin lymphoma, sarcomas and carcinomas (e.g. colon cancer, pancreatic cancer, breast cancer, ovarian cancer, glioblastoma, prostate cancer, lung cancer, non-small cell lung carcinoma, adenocarcinoma of lung, (malignant) melanoma, thyroid cancer and papillary thyroid carcinoma). As is well known, tumors may metastasize from a first locus to one or more other body tissues or sites. Reference to treatment for a “neoplasm”, “tumors” or “cancer” in a patient includes treatment of the primary cancer and, where appropriate, treatment of metastases.

As used herein, the term “SNV” means a single nucleotide variant, where the sequence differs from the germline by one base changed into another; in a tumor, such a variant is a somatic variant.

As used herein the term “antigen” is a substance, preferably a (poly)peptide that induces an immune response.

As used herein the term “neoantigen” or “neoantigenic peptide” is an antigen that has at least one alteration that makes it distinct from the corresponding wild-type, parental antigen, e.g., via mutation in a tumor cell. A neoantigen can include a polypeptide sequence or a nucleotide sequence. The term “neoantigenic peptide” also encompasses a nucleotide sequence encoding such neoantigen peptide. A “tumor neoantigen” or “tumor-specific neoantigen” is a neoantigen present in a subject's tumor cell or tissue but not in the subject's corresponding normal cell or tissue. The neoantigens of the present invention are tumor-specific neoantigens.

As used herein the term “epitope” is the specific portion of an antigen typically bound by an antibody or T cell receptor. As used herein the term “neoepitope” is the specific portion of a neoantigen typically bound by an antibody or T cell receptor.

The term “peptide” is used herein interchangeably with “mutant peptide” and “neoantigenic peptide” to designate a series of residues, typically L-amino acids, connected one to the other, typically by peptide bonds between adjacent amino acids. Similarly, the term “polypeptide” is used interchangeably with “mutant polypeptide” and “neoantigenic polypeptide” in the present specification to designate a series of residues, typically L-amino acids, connected one to the other, typically by peptide bonds between the adjacent amino acids. The polypeptides or peptides can be a variety of lengths.

In certain embodiments the size of the at least one neoantigenic peptide molecule may comprise, but is not limited to, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 60, about 70, about 80, about 90, about 100, about 110, about 120 or greater amino acid residues, and any range derivable therein. In specific embodiments the neoantigenic peptide molecules are equal to or less than 50 amino acids.

The neoantigens and polypeptides preferably do not induce an autoimmune response and/or invoke immunological tolerance when administered to a subject.

As used herein the term “ORF” means open reading frame. As used herein the term “neoORF” is a tumor-specific ORF arising from a mutation, in particular a frameshift mutation as described herein. A “frameshift mutation” is a mutation causing a change in the frame of the protein, for example as the consequence of an indel mutation as described herein.

Within the context of the current invention, the mutation in the tumor cell that gives rise to the neoantigen is a frameshift mutation with a net change of sequence, compared to wild type, that is not + or −3 nucleotides or a multiple thereof (6, 9, 12, 15 etc.). For example, the frameshift consists of a net change of + or −1, 2, 4, 5, 7, 8 . . . nucleotides. As will be understood by the skilled person, the frameshift mutation within the context of the current invention should not create a novel stop triplet on the spot. The frameshift within the context of the current invention gives rise to a neoORF, a novel open reading frame generated in the tumor by insertions, deletions or substitutions that bring into frame sequences encoding completely novel stretches of amino acids. The frameshift mutation within the context of the current invention is a mutation that occurs in the coding region of a gene, i.e. the region that encodes a protein. (Note that the new open reading frame can sometimes extend beyond the stop codon of the wild type gene.)
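The frameshift criterion above reduces to modular arithmetic on the net length change. The following minimal Python sketch illustrates it; the function names are hypothetical, the requirement that no stop triplet be created on the spot is not checked, and whether offset 1 or offset 2 is called the "+1" or the "−1" frame is a labelling convention not taken from this disclosure.

```python
def net_length_change(ref_allele: str, alt_allele: str) -> int:
    """Net change in coding-sequence length when ref_allele is replaced by alt_allele."""
    return len(alt_allele) - len(ref_allele)


def is_frameshift(net_change: int) -> bool:
    """True unless the net change is 0 or a multiple of 3 (3, 6, 9, ...)."""
    return net_change % 3 != 0


def downstream_frame_offset(net_change: int) -> int:
    """Offset (0, 1 or 2), relative to the reference codon boundaries, at which the
    reference sequence downstream of the mutation is read after the indel."""
    return (-net_change) % 3


if __name__ == "__main__":
    # Net changes between -8 and +8 that shift the reading frame
    print([n for n in range(-8, 9) if is_frameshift(n)])
    # -> [-8, -7, -5, -4, -2, -1, 1, 2, 4, 5, 7, 8]
    # A 1-nt deletion (REF "GA" -> ALT "G") reads downstream reference codons at offset 1
    print(downstream_frame_offset(net_length_change("GA", "G")))  # -> 1
```

The output reproduces the series + or −1, 2, 4, 5, 7, 8 given in the text. Insertions and deletions of the same size land in different alternative frames, which is why both frames are considered.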

As used herein the term “immunogenic” is the ability to elicit an immune response, e.g., via T cells, B cells, or both. As used herein, an immunogenic composition is a composition comprising substances, in particular neoantigens, with the ability to elicit an immune response. Such a composition may for example be a neoantigen-based vaccine based on one or more neoantigens, e.g., a plurality of neoantigens.

As used herein the term “sequence” can refer to a peptide sequence, DNA sequence or RNA sequence. The term “sequence” will be understood by the skilled person to mean either or any of these, and will be clear in the context provided. For example, when comparing sequences to identify a match, the comparison may be between DNA sequences, RNA sequences or peptide sequences, but also between DNA sequences and peptide sequences. In the latter case the skilled person is capable of first converting such DNA sequence or such peptide sequence into, respectively, a peptide sequence and a DNA sequence in order to make the comparison and to identify the match.
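When a DNA or RNA sequence is to be compared with a peptide sequence, the nucleotide sequence can be translated first. A minimal sketch, assuming the standard genetic code and comparison on the forward strand only (reverse-complement reading and the degenerate back-translation of peptides into DNA are not handled); `translate` and `encodes_peptide` are illustrative names:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop triplet)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides: str, frame: int = 0) -> str:
    """Translate DNA or RNA in the given frame (0, 1 or 2); unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")  # accept RNA as well as DNA
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def encodes_peptide(nucleotides: str, peptide: str) -> bool:
    """True if the peptide occurs in any forward-strand frame of the nucleotide sequence."""
    return any(peptide in translate(nucleotides, frame) for frame in range(3))


print(translate("ATGGCCTAA"))               # -> MA*
print(translate("AUGUUU"))                  # RNA input -> MF
print(encodes_peptide("CATGGCCTAA", "MA"))  # found in frame 1 -> True
```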

As used herein the term “exome” is a subset of the genome that codes for proteins. An exome can be the collective exons of a genome.

As used herein the term “transcriptome” is the set of all RNA molecules in a cell or population of cells. In a preferred embodiment the transcriptome refers to all mRNA.

As used herein the term “sample” can include a single cell or multiple cells or fragments of cells or an aliquot of body fluid, taken from a subject, by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision, or intervention or other means known in the art.

As used herein the term “subject” encompasses a cell, tissue, or organism, human or non-human, whether in vivo, ex vivo, or in vitro, male or female. The term subject is inclusive of mammals including humans. Preferably the subject is a human subject diagnosed with cancer or suspected to have cancer.

As used herein the term “mammal” encompasses both humans and non-humans and includes but is not limited to humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.

As used herein, we define a NeoORFeome as the set of all sequences in the human genome that are out of frame with known translated genes, but that as a result of a frame shift mutation can become in frame and encode a novel peptide of at least 8 or 10 amino acids in length before encountering a stop codon. The NeoORFeome is the complete space in which, by single frame shift mutations, novel peptides of significant length (here defined as 10 amino acids or longer) can be encoded and (potentially) expressed. In other words, the NeoORFeome comprises the complete set of neo Open Reading Frames in the human genome, defined as the sum of open reading frames that are not found in frame in the wild type human genome without mutation, but which by a single insertion/deletion/substitution can be brought into frame and then encode a peptide of a minimal length of 8 or 10 amino acids. The human NeoORFeome as here defined comprises 35,226,335 amino acids, approximately 35 million. This corresponds to approximately 105 Mb (megabases) of encoding DNA. (The human genome is around 3000 Mb.)
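The stated DNA size follows from three nucleotides per encoded amino acid. A quick arithmetic check using only the figures given above (stop triplets ignored; the constant and function names are illustrative):

```python
NEOORFEOME_AMINO_ACIDS = 35_226_335   # size of the human NeoORFeome given in the text
HUMAN_GENOME_BP = 3_000_000_000       # ~3000 Mb, as stated in the text


def encoding_bp(amino_acids: int) -> int:
    """Nucleotides needed to encode the residues (3 per codon, stop triplets ignored)."""
    return 3 * amino_acids


bp = encoding_bp(NEOORFEOME_AMINO_ACIDS)
print(bp)                                    # -> 105679005 (~105.7 Mb)
print(round(100 * bp / HUMAN_GENOME_BP, 2))  # -> 3.52 (% of the genome size)
```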

We define herein peptides that are not encoded by the wild type human genome, but that, after a frame shift mutation as defined herein, can be encoded by a tumor genome, as novel open reading frame peptides, or NOPs. For any potential NOP in the NeoORFeome the C-terminal sequence is fixed (bounded by the encounter of a stop codon) and does not depend on the precise location of the frame shift mutation; the N-terminus, however, is defined by the mutation site, which is where protein translation potentially shifts into the novel frame. The most upstream novel sequence of a NOP is the most 5′ triplet in the wild type human genome of the Neo Open Reading Frame sequence which is not a stop triplet. We define the master NOP, also referred to as the pNOP, as the peptide encoded by the longest possible sequence, i.e., from the most upstream triplet as described to the stop triplet at the 3′ end. Sequences of such master NOPs are represented in the database/library as defined herein.
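Because the C-terminus of every NOP is fixed by the downstream stop while the N-terminus depends on the mutation site, each NOP observed in a patient can be viewed as a C-terminal portion of its master NOP. A minimal sketch of that relation, using a made-up pNOP sequence and a hypothetical helper `expressed_nop`; the hybrid codon spanning the mutation site and the wild-type residues upstream of it are deliberately not modelled:

```python
def expressed_nop(pnop: str, shift_codon: int) -> str:
    """Portion of a master NOP (pNOP) brought into frame when the frame shift lands
    at codon index `shift_codon` of the pNOP (0 = most upstream triplet).

    Simplification: the hybrid codon at the mutation site and the wild-type
    residues N-terminal of it are not modelled."""
    if not 0 <= shift_codon < len(pnop):
        raise ValueError("shift site must fall inside the pNOP")
    return pnop[shift_codon:]


HYPOTHETICAL_PNOP = "RVLSPQWTHKEGLAF"  # made-up 15-residue master NOP

for site in (0, 4, 9):
    print(site, expressed_nop(HYPOTHETICAL_PNOP, site))
# 0 RVLSPQWTHKEGLAF
# 4 PQWTHKEGLAF
# 9 KEGLAF
```

Every expressed portion ends with the same residues, so a library entry holding the full pNOP covers all of them.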

Indeed, the library of pNOPs is defined as (part of) the subset of the NeoORFeome which we found to be most frequently switched on by frame shift mutation in a very large set of tumor sequence data; it is thus a listing of master NOPs, or pNOPs. For practical purposes the listing is categorized into ‘most found’, ‘often found’ and ‘frequently found’, whereby the largest set of pNOPs (frequently found) contains the category often found, which in turn includes the category most found. The complete library contains pNOPs that are encountered in over 30% of all cancers as described in the TCGA database. Based on our analysis, for any new tumor whose genome (or transcriptome, exome or ORFeome, which are also included in any of the embodiments described below referring to genome, exome or transcriptome) is sequenced, the chance is over 30% that it will encode a NOP listed in our library as described here. In other words, the library can potentially serve over 30% of all cancer patients.
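The nested categories and the coverage estimate can be illustrated as follows. This is a sketch with hypothetical pNOP identifiers and toy per-tumor hit sets; it does not reproduce the actual library or the TCGA analysis, and "coverage" here simply means the fraction of tumors encoding at least one pNOP of a given category:

```python
from typing import Dict, Set


def nested_tiers(most: Set[str], often_extra: Set[str], frequently_extra: Set[str]) -> Dict[str, Set[str]]:
    """Build the three nested categories: most found within often found within frequently found."""
    most_found = set(most)
    often_found = most_found | set(often_extra)
    frequently_found = often_found | set(frequently_extra)
    return {"most found": most_found, "often found": often_found, "frequently found": frequently_found}


def coverage(tumor_hits: Dict[str, Set[str]], library: Set[str]) -> float:
    """Fraction of tumors that encode at least one pNOP from the given library subset."""
    if not tumor_hits:
        return 0.0
    covered = sum(1 for hits in tumor_hits.values() if hits & library)
    return covered / len(tumor_hits)


tiers = nested_tiers({"pNOP_A"}, {"pNOP_B"}, {"pNOP_C"})
tumor_hits = {  # toy data: pNOPs found in each tumor's sequence
    "T1": {"pNOP_A"},
    "T2": {"pNOP_B"},
    "T3": {"pNOP_A", "pNOP_C"},
    "T4": {"pNOP_C"},
    "T5": {"pNOP_Z"},  # not in any category
}
print({name: coverage(tumor_hits, lib) for name, lib in tiers.items()})
# -> {'most found': 0.4, 'often found': 0.6, 'frequently found': 0.8}
```

Because the categories are nested, coverage can only grow (or stay equal) when moving from the smallest to the largest category.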

DETAILED DESCRIPTION

NOP sequences (also known as neo Open Reading Frames, neoORFs) have been previously described as potential cancer vaccines. See, for example, WO95/32731, WO2016172722 (Nantomics), WO2016/187508 (Broad), WO2017/173321 (Neon Therapeutics), US2018340944 (University of Connecticut), and WO2019/012082 (Nouscom), as well as Rahma et al. (Journal of Translational Medicine 2010 8:8) which describes peptides resulting from frameshift mutations in the von Hippel-Lindau tumor suppressor gene (VHL) and Rajasagi et al. (Blood 2014 124(3):453-462) which reports the systematic identification of personal tumor specific neoantigens.

Surprisingly, the present disclosure provides NOP sequences that are shared among cancer patients. The finding of shared NOP sequences is used for a methodological innovation that defines a library of NOP sequences and a strategy to match one or more NOPs encoded by any tumor genome sequence with the library of NOP sequences, to select preferred sequences for design of a vaccine for treatment of said tumor. The library of NOP sequences disclosed herein and the recognition that such NOP sequences are shared among cancer patients was not previously described in the art.

It is contemplated that any method, use or composition described herein can be implemented with respect to any other method, use or composition described herein. Embodiments discussed in the context of methods, use and/or compositions of the invention may be employed with respect to any other method, use or composition described herein. Thus, an embodiment pertaining to one method, use or composition may be applied to other methods, uses and compositions of the invention as well.

As embodied and broadly described herein, the present invention is directed to the surprising finding that it is possible to provide a library of limited size that comprises (sequences of) neo open reading frame peptides that are found in tumor material of patients as the consequence of frameshift mutations that lead to a new open reading frame with a novel, common, tumor-specific protein sequence towards the C-terminal end. By comparing sequence information from a tumor sample of a patient with the library, it has now become possible to quickly identify whether there is a match between a sequence identified in the patient's material and a sequence in the database. A match is identified when a sequence identified in the patient's material and a sequence from the database have a string in common, i.e. a peptide sequence (or an RNA or DNA sequence encoding such peptide sequence, in case the comparison is on the level of RNA or DNA), representative of at least 8, preferably at least 10, adjacent amino acids. The thus identified tumor-specific mutant polypeptide encoded by a tumor-specific frameshift mutation in (expressed) genes of the subject having cancer can be used to provide for neoantigens comprising a tumor-specific neoepitope. With the limited libraries as provided herein, and based on the actual number of sequences in the database (as described elsewhere herein), it is estimated that between about 5-30% of the population of patients having cancer can be provided with a subject-specific and tumor-specific immunogenic composition comprising one or more neoantigens based on one or more matches between a sequence identified in the patient's material and a sequence from the database.
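The matching criterion, a shared stretch of at least 8 (preferably at least 10) adjacent amino acids, can be implemented as a k-mer index over the library. A minimal sketch, assuming peptide-level comparison, exact identity, and toy library entries with made-up identifiers (`pNOP_1`, `pNOP_2`); the function names are illustrative:

```python
from collections import defaultdict
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(library: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer of every library pNOP to the identifiers containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, pnop in library.items():
        for kmer in kmers(pnop, k):
            index[kmer].add(seq_id)
    return index


def matching_pnops(patient_peptide: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Library entries sharing at least k adjacent residues with the patient peptide."""
    hits: Set[str] = set()
    for kmer in kmers(patient_peptide, k):
        hits |= index.get(kmer, set())
    return hits


LIBRARY = {"pNOP_1": "MKTWLRSPAGHEVQNL", "pNOP_2": "QRSTVLLPWKGDHAEY"}  # toy entries

# Shares the 10-residue stretch LRSPAGHEVQ with pNOP_1
print(matching_pnops("AAAALRSPAGHEVQCC", build_index(LIBRARY, 10), 10))  # -> {'pNOP_1'}
# Shares only 7 residues (VLLPWKG) with pNOP_2, below the 8-residue threshold
print(matching_pnops("AAVLLPWKGAA", build_index(LIBRARY, 8), 8))  # -> set()
```

Any common stretch of k or more residues necessarily contains a common k-mer, so exact k-mer lookup finds every match at the chosen threshold; the same k must be used to build the index and to query it.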

In more detail, the inventor of the present invention realized that, with the human genome being about 3×10⁹ base pairs, about 1.5% of which codes for protein, the number of possible point mutations (nucleotide changes or SNVs) is virtually infinite, especially since each position can mutate into three others, and of course endless other rearrangements and indels are possible. Therefore the number of possible neoantigens that arise in tumors is also huge.
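To put the size of this search space in numbers, using only the figures in this paragraph (an illustrative calculation; indels and rearrangements are not counted, and the pair count is an upper bound that includes mutually exclusive pairs at the same position):

```python
from math import comb

GENOME_BP = 3_000_000_000
CODING_TENTHS_OF_PERCENT = 15  # 1.5 %, kept as an integer to avoid float rounding

coding_bp = GENOME_BP * CODING_TENTHS_OF_PERCENT // 1000
single_base_substitutions = 3 * coding_bp  # each position can change into three other bases
substitution_pairs = comb(single_base_substitutions, 2)

print(coding_bp)                   # -> 45000000
print(single_base_substitutions)   # -> 135000000
print(substitution_pairs)          # -> 9112499932500000
```

Even single substitutions number in the hundreds of millions, and combinations grow quadratically and beyond, which is why the set of tumor neoantigens arising from SNVs is practically unbounded.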

In the present invention a specific window of cancer mutations is described. This window is derived from the reference human genome sequence. While the 3×10⁹ base pairs can mutate in infinite ways, there is only a limited repertoire of possible neoantigens dictated by the coding (and expressed) part of the human genome sequence. The ORFeome (the complete set of open reading frames (ORFs) in a genome), as it has been referred to, is ‘meant’ to be read in the proper reading frame. However, each gene can also be read in two other frames, the −1 and the +1 frame. These alternative frames do not necessarily encode relevant peptides, since they may quickly run into a stop triplet. The present inventors have now defined that part of the genome that encodes peptides resulting from out-of-frame translation and that are at least the size of a potential epitope when seen as a neoantigen. These peptides are referred to as neo open reading frame peptides, or NOPs. The maximal coding region for each of these NOPs (which we refer to as the master NOP or pNOP, for potential NOP) begins immediately downstream of a stop triplet in the reference human genome sequence, then contains at least ten amino acid-encoding triplets, and finishes with a stop.
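The pNOP definition above translates into a scan of the two alternative frames for stop-to-stop stretches of at least ten codons. A minimal sketch under stated assumptions: the input is a forward-strand, spliced coding sequence (with 3′ flanking sequence appended if NOPs running past the wild-type stop are of interest), frames 1 and 2 stand for the two alternative frames, stretches not bounded by a stop at both ends within the input are skipped, and the function name is hypothetical:

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop triplet)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int) -> str:
    """Translate complete codons starting at nucleotide offset `frame`."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def alternative_frame_pnops(seq: str, min_len: int = 10) -> List[Tuple[int, int, str]]:
    """Return (frame, nucleotide_start, peptide) for every stop-to-stop stretch of at
    least min_len residues in frames 1 and 2 of `seq` (frame 0 = reference frame)."""
    seq = seq.upper()
    pnops = []
    for frame in (1, 2):
        protein = translate(seq, frame)
        previous_stop = None  # codon index of the last stop seen in this frame
        for i, aa in enumerate(protein):
            if aa != "*":
                continue
            if previous_stop is not None and i - previous_stop - 1 >= min_len:
                start = previous_stop + 1  # first triplet after the upstream stop
                pnops.append((frame, frame + 3 * start, protein[start:i]))
            previous_stop = i
    return pnops


toy = "A" + "TAA" + "GCT" * 10 + "TAG" + "GC"  # made-up sequence, not a real gene
print(alternative_frame_pnops(toy))              # -> [(1, 4, 'AAAAAAAAAA')]
print(alternative_frame_pnops(toy, min_len=11))  # -> []
```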

Thus each gene as defined in the reference genome sequence includes a set of pNOPs. These NOPs are commonly not expressed in the human body; if they were, they would be seen by the immune system as entirely foreign. Since, unlike SNV-neoantigens, they are not a small change in a known peptide chain but a longer stretch of foreign amino acid sequence, it is to be expected a priori that these NOPs are seen by the immune system, on average, as much more foreign and antigenic than SNV-neoantigens.

But how can these NOPs (or parts thereof) be expressed in a tumor of a patient? In the present invention we limit ourselves to simple insertions and deletions in coding regions, which, in order to cause a frame shift, could be of any length but should not have a length of 3 nucleotides or a multiple of 3 nucleotides, and should not create a novel stop triplet on the spot. Again, the set of such frame shift causing mutations is, like the set of SNV-causing mutations, virtually infinite: at every position in the 1.5% coding region of the genome, almost any insertion or deletion (or net result of insertion plus deletion) with a net change of sequence of + or −1, 2, 4, 5, 7, 8 etc. nucleotides could bring a NOP into frame.

We assumed, and tested, that to a first approximation frame shift mutations (like SNVs) can occur at any position in the genome. The basic assumption is that the sites and the + or − nature of frame shift mutations are more or less randomly distributed; in other words, within a gene every nucleotide is a potential position for a frame shift mutation to occur, and thus for a new NOP to be put into frame. We tested that assumption by extracting all frame shift mutations in just one gene from The Cancer Genome Atlas (TCGA), which consists of 11,300 whole genome sequenced human tumors covering 34 major tumor types. FIG. 1 panel a shows all the point mutations in that gene, and indeed these appear reasonably evenly distributed over the gene. FIG. 1 panel b shows the pNOPs as defined above: all the potential NOPs that could in theory arise as the result of a frame shift. FIG. 1 panel b also shows the results: the frame shift causing insertions/deletions are shown in small black print, followed by the patient-specific NOP. The figure clearly shows several findings: 1. The assumption that frame shifts are a priori spread continuously over the gene is confirmed (for this gene): there is an almost continuous and straight diagonal line through the sites where the NOP is put into frame. 2. Frame shifts are (like SNVs) almost all different; it is rare that among 11,300 different tumors from different patients we find the same SNV or the same frame shift. 3. Most importantly, in spite of the fact that almost all of these frame shift mutations are different and occur at a different precise position, they all cause (a portion of) the same NOP to go into frame, and certainly all cause the identical C-terminus of that NOP. 4. The TCGA dataset is large enough that the pattern of frame shift mutations is almost saturated, in the sense that many of the positions that can mutate to turn on a NOP are likely to have been hit in this dataset.
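Finding 3 can be reproduced on a toy example. In the sketch below (a made-up 33-nucleotide ORF, not a real gene), three single-nucleotide deletions at different positions each bring the same alternative frame into register. The mutant proteins therefore differ at the N-terminal junction, where a hybrid codon is formed, but all end at the same alternative-frame stop and share the same C-terminus. The standard genetic code is assumed and the helper names are illustrative:

```python
from itertools import product
from os.path import commonprefix

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop triplet)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_to_stop(cds: str) -> str:
    """Translate from the first nucleotide and stop at the first stop triplet."""
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
    return protein.split("*", 1)[0]


def delete_base(seq: str, pos: int) -> str:
    """Single-nucleotide deletion at 0-based position pos (net change -1)."""
    return seq[:pos] + seq[pos + 1:]


REFERENCE = "ATGGCCAGTTCGCACGTACCTGGACTAAAATAA"  # toy ORF encoding MASSHVPGLK*

wild_type = translate_to_stop(REFERENCE)
mutants = [translate_to_stop(delete_base(REFERENCE, p)) for p in (4, 10, 16)]
# Longest common C-terminal suffix of the mutant proteins
shared_c_terminus = commonprefix([m[::-1] for m in mutants])[::-1]

print(wild_type)          # -> MASSHVPGLK
print(mutants)            # -> ['MAVRTYLD', 'MASCTYLD', 'MASSHDLD']
print(shared_c_terminus)  # -> LD
```

Downstream of each junction the mutants read the same alternative-frame sequence (…VRTYLD), so a single master NOP covers all three patient-specific NOPs.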

This analysis of the gene in FIG. 1 led the inventor to realize that it may not be necessary to discover further NOPs by analyzing patients, because the repertoire of NOPs that can arise is already defined by the (entire) set of pNOPs disclosed herein, which defines (part of) the library.

In other words, a library can be provided that comprises sequences (DNA, RNA or peptide) that correspond to, and comprise, the sequences of potential neo open reading frame peptides that can occur in a patient as the consequence of frameshift mutations as defined herein occurring in a gene of the patient. Depending on where in a gene such a frameshift mutation occurs, the length of an individual's neo open reading frame peptide can vary between individuals; the sequences provided in the library or database, however, do comprise such individual neo open reading frames of varying length and may therefore also be referred to as master NOPs (see FIGS. 1 and 2). In other words, a master NOP sequence provided in the library/database comprises the NOP sequences of which a part (or the entirety) may occur in individual patients as the consequence of a frame shift mutation as defined herein. By comparing sequence information from a new patient with the database, a match between a sequence in the tumor material of the patient and such a master NOP may be recognized, thus allowing identification of a neoantigen that is specific for that subject and for that tumor. Based thereupon, using methodology well known to the skilled person, a vaccine can be provided based on said identified neoantigen (or on more than one neoantigen). The library is of such extent that, depending on whether more or fewer of the sequences with SEQ ID NOs 1-1989 disclosed herein are included (or, of course, nucleotide sequences encoding such peptide sequences), more than 5, 7.5, 10, 20, or even up to 30% or more of patients suffering from cancer are expected to have at least one neoantigen in the tumor that can be detected by comparison with the library/database as provided herein.

The results shown below confirm the conclusion above that, to a first approximation, the chance of any NOP arising through a frame shift at a given position seems random. Whether these mutations are represented in the tumor of a patient, however, is far from random. The TCGA, which given its size and scope provides a high-resolution picture of the genetic events to be expected in the tumor of a patient, shows that some genes have a very dense pattern of NOPs, while others are represented very rarely.

The present inventor further realized that for SNVs, as a rule, every SNV in a gene in a tumor is different, and thus every resulting neoantigen is different. For the pNOPs as described herein, the data show that within a pNOP virtually every frame shift is different, but every NOP is the same, at least at its C-terminus. Thus, for each NOP the genome acts as a funnel: it absorbs frame shift mutations that occur at different sites and have different sequences, but all of them trigger expression of the same NOP, at least of its C-terminal part.
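The funnel effect can be reproduced on a toy sequence. The sketch below, assuming single-nucleotide deletions in a coding sequence and the standard genetic code, deletes one base at several different positions and reports the mutant residues downstream of the mutated codon. Each deletion is different, yet every product ends with the same C-terminus, namely that of the +1-frame pNOP of the toy sequence. The sequence, positions and function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate from the first base until the first stop codon (excluded)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def downstream_of_deletion(cds, pos):
    """Delete the base at 0-based `pos` and return the mutant residues that
    follow the codon containing the deletion, i.e. the shifted-frame part."""
    mutant = cds[:pos] + cds[pos + 1:]
    return translate(mutant)[pos // 3 + 1:]


TOY_CDS = "".join(["ATG", "GAC", "CAG", "AAG", "GCC", "CCA", "CTA", "AAA", "TGA"])
PNOP_PLUS1 = "WTRRPH"  # +1-frame stretch of TOY_CDS up to its stop codon

tails = [downstream_of_deletion(TOY_CDS, p) for p in (3, 7, 10, 13)]
print(tails)                                        # ['RRPH', 'RPH', 'PH', 'H']
print(all(PNOP_PLUS1.endswith(t) for t in tails))   # True
```

The residue encoded by the mutated codon itself is excluded here because it mixes bases from both frames and therefore differs between mutations.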

It has been observed before that some genes are often hit in cancer because they are human cancer driver genes. We screened the TCGA for our set of pNOPs and asked how often each of them is represented in the set of tumors. Table 1 shows the result: when the NOPs actually found in human cancers are ranked, an impressive list of the usual suspects, i.e. well-known cancer genes, shows up.

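TABLE 1 Most often encountered genes

| Gene symbol | Count |
|---|---|
| TP53 | 308 |
| ARID1A | 206 |
| TTN | 201 |
| KMT2D | 171 |
| MUC16 | 118 |
| APC | 118 |
| PTEN | 95 |
| KMT2C | 84 |
| GATA3 | 82 |
| ATRX | 80 |
| NF1 | 79 |
| CDKN2A | 77 |
| FAT1 | 70 |
| ZFHX3 | 70 |
| KMT2B | 69 |
| PBRM1 | 69 |
| CIC | 65 |
| PCLO | 61 |
| VHL | 58 |
| MAP3K1 | 57 |
| ZFHX4 | 57 |
| KDM6A | 54 |
| RB1 | 54 |
| NOTCH1 | 52 |
| ARID2 | 51 |
| CDH1 | 50 |
| MUC5B | 49 |
| PIK3R1 | 49 |
| OBSCN | 48 |
| CSMD3 | 48 |
| USH2A | 47 |
| NEB | 45 |
| SOX9 | 44 |
| SYNE1 | 44 |
| BRCA2 | 42 |
| SETD2 | 42 |
| TG | 42 |
| ARHGAP35 | 41 |
| HMCN1 | 41 |
| BAP1 | 41 |
| LRP1B | 41 |
| MGA | 40 |
| ZFP36L1 | 40 |
| CUBN | 40 |
| CCDC168 | 39 |
| SPEN | 38 |
| FAT2 | 38 |
| FAT3 | 37 |
| SYNE2 | 37 |
| ELF3 | 37 |
| MBD6 | 37 |
| CREBBP | 37 |
| BCOR | 36 |
| ZNF292 | 36 |
| AHNAK2 | 35 |
| KMT2A | 35 |
| APOB | 35 |
| KIAA1217 | 35 |
| CDK12 | 34 |
| ATM | 34 |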
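A ranking such as Table 1 can be produced by counting, per gene, the tumors in which a NOP from the library is found. A minimal sketch, assuming the hits are available as (tumor identifier, gene symbol) pairs and that each tumor is counted at most once per gene; how the counts in Table 1 were actually tallied is not specified here, and the input records below are invented for illustration, not TCGA data.

```python
from collections import Counter


def rank_genes(hits):
    """Count distinct tumors per gene and return (gene, count) pairs,
    most frequent first (ties broken alphabetically)."""
    unique_pairs = set(hits)  # a tumor counts once per gene
    counts = Counter(gene for _tumor, gene in unique_pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hit records: (tumor_id, gene_symbol)
example_hits = [
    ("T1", "TP53"), ("T1", "TP53"), ("T2", "TP53"), ("T3", "TP53"),
    ("T2", "ARID1A"), ("T4", "ARID1A"), ("T5", "APC"),
]
print(rank_genes(example_hits))  # [('TP53', 3), ('ARID1A', 2), ('APC', 1)]
```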

It is to be noted that the tumors in the TCGA come from different people with different diseases (one may be a Caucasian with a glioblastoma, another a person of Japanese descent with colon cancer), but they have one thing in common: they have cancer. Because of the funneling effect described above, a library of limited size containing NOPs can therefore be provided that applies to many different tumors in different people.

In summary, the present invention is based on the surprising finding that a library of limited size of potential neo open reading frame peptides (pNOPs) can be provided, containing (peptide) sequences for which corresponding peptides arise, through a frame shift mutation, in up to and over 30% of all cancer patients.

The relevance of the present invention is therefore twofold:

-   Independent of the precise type of delivery (DNA, RNA, recombinant organism, peptide, recombinant protein), a library of antigenic peptides (or of the corresponding DNA, RNA, etc.) can be generated in advance and stored, such that after genome sequencing of a sample from a subject with cancer the relevant reagents can be provided off the shelf. For tumor types in which a patient's expected life span after diagnosis is less than a year (e.g. pancreatic cancer or glioblastoma), this may save critical months before the onset of treatment. It also creates substantial economies of scale that will eventually reduce cost.
-   Since the library of vaccines is limited, one may keep track of the precise peptide or mixture of peptides that works best.

Generally speaking, and in one embodiment, the workflow for providing an antigenic peptide for use in an immunogenic composition is as follows. When a patient is diagnosed with cancer, a biopsy may for example be taken from the tumor, or a set of samples may be taken from the tumor after resection. The genome, exome or transcriptome is sequenced by existing methods. The outcome is compared, for example using a web interface or software, to (part of) the pNOP set in our database, which identifies and displays any hits. The patient and/or physician can then, if they desire, be informed whether or not hits have been found; hits are expected in up to 30% of cases. If matches are found, an order can be placed for a vaccine corresponding to the personal NOPs of the patient. This is fully personalized vaccination, since no vaccine is provided that is not expected to be of potential benefit. If a patient's tumor contains NOPs that are significantly longer than 20 amino acids, multiple peptides of overlapping sequence are preferably provided. If the only NOPs are relatively short, one may choose to incorporate these potential neoantigens into a longer peptide by adding flanking sequences (to ensure proper processing by the proteasome). The limited size of the library makes it more practical to synthesize, purify, check, store and supply the peptides under the relevant regulatory terms.

In its broadest sense, there is provided a method of identifying at least one neoantigen in a cancer patient by identifying a match between a neo open reading frame peptide present in the tumor of the patient and a sequence in a database of sequences (DNA, RNA or peptide) representing the master neo open reading frame peptides as defined herein. The neoantigen thus identified can be used to prepare an immunogenic composition useful in the treatment of the patient.

More in particular, there is provided a method of identifying at least one neoantigen for preparing a subject-specific immunogenic composition, the method comprising:

-   identifying a match between at least one sequence or portion thereof in a set of sequences obtained from a subject-specific tumor genome, exome or transcriptome by complete, targeted or partial genome, exome, or transcriptome sequencing,
-   by comparing the at least one sequence or portion thereof with a database of nucleotide and/or polypeptide sequences, wherein the sequences in the database are represented by at least 147 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-147,
-   wherein a match is identified when the at least one sequence or portion thereof and a sequence from the database have a string in common representative of at least 8, preferably at least 10, adjacent amino acids,
-   to thereby identify a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer,
-   and optionally, repeating the method for a further sequence or portion thereof from the set of sequences of the subject-specific tumor genome, exome or transcriptome.

In another embodiment there is provided a method of identifying at least one neoantigen for preparing a subject-specific immunogenic composition, the method comprising:

-   a. performing complete, targeted or partial genome, exome, or transcriptome sequencing of at least one tumor sample obtained from a subject having cancer to obtain a set of sequences of the subject-specific tumor genome, exome or transcriptome;
-   b. comparing at least one sequence or portion thereof from the set of sequences with a database of nucleotide and/or polypeptide sequences, wherein the sequences in the database are represented by at least 147 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-147;
-   c. identifying a match between the at least one sequence or portion thereof from the set of sequences and a sequence in the database when they have a string in common representative of at least 8, preferably at least 10, amino acids, to thereby identify a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer; and optionally
-   d. repeating steps b and c for a further sequence or portion thereof from the set of sequences of the subject-specific tumor genome, exome or transcriptome.
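Steps b to d can be sketched as a lookup in an index of peptide k-mers. This assumes that the tumor sequences have already been translated into peptides and that the database is held as peptides keyed by identifier; the identifiers, sequences and function names are made up, and k = 8 reflects the minimum match length stated above (k = 10 for the preferred setting).

```python
from collections import defaultdict


def build_kmer_index(database, k=8):
    """Map every k-mer of every database peptide to the IDs containing it."""
    index = defaultdict(set)
    for seq_id, peptide in database.items():
        for i in range(len(peptide) - k + 1):
            index[peptide[i:i + k]].add(seq_id)
    return index


def find_matches(patient_peptides, database, k=8):
    """Return {position in patient list: sorted matching database IDs}."""
    index = build_kmer_index(database, k)
    hits = {}
    for n, peptide in enumerate(patient_peptides):  # step d: next sequence
        matched = set()
        for i in range(len(peptide) - k + 1):      # steps b-c: compare, match
            matched |= index.get(peptide[i:i + k], set())
        if matched:
            hits[n] = sorted(matched)
    return hits


# Hypothetical database entries and patient-derived peptides.
database = {"NOP_A": "KLVWQRSTPEHG", "NOP_B": "MNPQRSTVWYAC"}
patient = ["GGKLVWQRSTAA", "QRSTVWYA", "ACDEFGHIK"]
print(find_matches(patient, database))        # {0: ['NOP_A'], 1: ['NOP_B']}
print(find_matches(patient, database, k=10))  # {}
```

An exact k-mer index is used here because the match criterion is an exact shared string; alignment tools such as BLAST would additionally report approximate matches.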

In the method, a match is identified by comparing at least one sequence or portion thereof, from a set of sequences obtained from a subject-specific tumor genome, exome or transcriptome by complete, targeted or partial genome, exome, or transcriptome sequencing, with the database.

The set of sequences is preferably obtained by taking a sample from a tumor of the patient. The skilled person knows how to obtain samples from a tumor of a patient depending on the nature of the tumor, for example its location or size. Preferably the tumor is a solid tumor. Preferably the sample is obtained from the patient by biopsy or resection. The sample is obtained in such a manner that it allows sequencing of the genetic material contained therein. To avoid a less accurate identification of the at least one neoantigen, the sequence of the tumor sample obtained from the patient is preferably compared to the sequence of non-tumor tissue of the same patient, usually blood, obtained by known techniques (e.g. venipuncture).

Sequencing of the genome, exome or transcriptome may be complete, targeted or partial. In some embodiments the sequencing is complete (whole sequencing). In some embodiments the sequencing is targeted, meaning that certain regions or portions of the genome, exome or transcriptome are purposively sequenced. For example, targeted sequencing may be directed only at those sequences in the set of sequences obtained from the cancer patient that could provide a match with one or more of the sequences in the database, for example by using specific primers. Put differently, targeted sequencing may be performed to determine whether there is a match with each of SEQ ID NOs 1-147, preferably with each of SEQ ID NOs 1-1989. In some embodiments only a portion of the genome, exome or transcriptome is sequenced. The skilled person is well aware of methods that allow whole, targeted or partial sequencing of the genome, exome or transcriptome of a tumor sample of a patient. For example, any suitable sequencing-by-synthesis platform can be used, including the Genome Sequencers from Roche/454 Life Sciences, the 1G Analyzer from Illumina/Solexa, the SOLiD system from Applied Biosystems, and the HeliScope system from Helicos Biosciences. The method of sequencing the genome, exome or transcriptome is not particularly limited within the context of the present invention.

In some preferred embodiments the genome is sequenced. In some preferred embodiments the exome is sequenced. In some preferred embodiments the transcriptome is sequenced. Preferably the transcriptome is sequenced, in particular the mRNA present in a sample from a tumor of the patient, since the transcriptome reflects the genes, and thus the neo open reading frame peptides as defined herein, that are expressed in the tumor of the patient.

Following sequencing of the tumor, using any sequencing method known in the art, the tumor sequences are aligned and compared to a reference genome. Sequence comparison can be performed by any suitable means available to the skilled person, for example using software tools such as BLAST, or dedicated software for aligning short or long, accurate or noisy sequence reads to a reference genome, e.g. the human reference genome GRCh37 or GRCh38. A match is identified when a sequence identified in the patient's material and a sequence as disclosed herein have a string in common, i.e. a peptide sequence (or an RNA or DNA sequence encoding such a peptide sequence, where the comparison is made at the RNA or DNA level), representative of at least 8, preferably at least 10, adjacent amino acids. Furthermore, sequence reads derived from a patient's cancer genome (or transcriptome) may only partially match the genomic DNA sequences encoding the amino acid sequences disclosed herein, for example if such reads are derived from exon/intron boundaries or exon/exon junctions, or if part of the read aligns upstream (towards the 5′ end of the gene) of the position of a frameshift mutation. Sequence reads can be analyzed, and frameshift mutations and their protein products identified, using standard methods in the field. For sequence alignment, aligners specific for short or long reads can be used, e.g. BWA (Li and Durbin, Bioinformatics. 2009 Jul. 15;25(14):1754-60) or Minimap2 (Li, Bioinformatics. 2018 Sep. 15;34(18):3094-3100). Frameshift mutations can subsequently be derived from the read alignments and their comparison to a reference genome sequence (e.g. the human reference genome GRCh37) using variant calling tools, for example the Genome Analysis Toolkit (GATK) (McKenna et al. Genome Res. 2010 September;20(9):1297-303), and the like.
The out-of-frame protein products (NOPs) resulting from frameshift mutations can be identified following the genetic triplet code known in the field and a database of reference sequences as publicly available through e.g. Ensembl, UCSC, NCBI or other sequence resources.
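Deriving the NOP from a called frameshift by following the triplet code can be sketched as below: apply the indel to the reference coding sequence, translate wild type and mutant, and keep the mutant residues after the shared N-terminal prefix. This assumes a spliced reference coding sequence (for example one retrieved from Ensembl), 0-based positions and the standard genetic code; the function name and toy sequence are hypothetical, and a real pipeline would also need the 3′ UTR for frameshifts whose new frame has no stop codon within the coding sequence.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate from the first base until the first stop codon (excluded)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def derive_nop(cds, pos, deleted="", inserted=""):
    """Return the mutant residues that differ from the wild-type protein
    after applying an indel at 0-based `pos` of the coding sequence."""
    if cds[pos:pos + len(deleted)] != deleted:
        raise ValueError("deleted bases do not match the reference sequence")
    mutant_cds = cds[:pos] + inserted + cds[pos + len(deleted):]
    wild_type, mutant = translate(cds), translate(mutant_cds)
    shared = 0
    limit = min(len(wild_type), len(mutant))
    while shared < limit and wild_type[shared] == mutant[shared]:
        shared += 1
    return mutant[shared:]


TOY_CDS = "".join(["ATG", "GAC", "CAG", "AAG", "GCC", "CCA", "CTA", "AAA", "TGA"])
print(translate(TOY_CDS))                   # MDQKAPLK
print(derive_nop(TOY_CDS, 4, deleted="A"))  # ARRPH
```

The first returned residue comes from the codon that spans the mutation and may differ between patients; the residues after it belong to the shared out-of-frame sequence.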

The match is identified by comparing at least one sequence or portion thereof obtained from the genome, exome or transcriptome of the subject having cancer with a database of nucleotide and/or polypeptide sequences as defined herein, wherein the sequences in the database are represented by at least 147 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-147.

As used herein, the term “the sequences in the database are represented by at least xxx sequences” indicates that the database may comprise said xxx sequences or parts thereof (e.g. a sequence of at least 8 or 10 adjacent amino acids present in a sequence of the set of xxx sequences), or DNA or RNA sequences encoding such sequences or parts thereof (e.g. a DNA or RNA sequence encoding a sequence of at least 8 or 10 adjacent amino acids present in a sequence of the set of xxx sequences representing the database). In other words, the database is represented by DNA, RNA or peptide sequences that represent, encode or define (part of) the master NOPs as disclosed herein.

In some embodiments the database is represented by at least 1, 20, 50, 100, 140, 147, 200, 250, 500, 800, 960, 1000, 1200, 1500, 1750 or 1989 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-20, 1-50, 1-100, 1-140, 1-147, 1-200, 1-250, 1-500, 1-800, 1-960, 1-1000, 1-1200, 1-1500, 1-1750 or 1-1989. It was found that the more sequences are selected from SEQ ID NOs 1-1989, the more patients are identified as having at least one neo open reading frame peptide or neoantigen in a tumor. The sequences in the database are sorted by the frequency with which they match an individual's neo open reading frame peptide or neoantigen in the population of cancer patients studied.

Comparing at least one sequence or portion thereof (i.e. part of the at least one sequence, preferably a part representative of at least 8 or 10 amino acids) from the set of sequences with a (DNA, RNA or peptide) sequence in the database can be done by any suitable means available to the skilled person, for example using software tools such as BLAST and the like.

A match between the at least one sequence or portion thereof from the set of sequences and a (DNA, RNA or peptide) sequence in the database is identified when they have a string in common representative of at least 8, preferably at least 10, amino acids, to thereby identify a tumor-specific (mutant) neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer. In other words, a match is identified when the sequence from the patient and a sequence in the database share, i.e. both comprise, at least 8, preferably at least 10, adjacent amino acids (or an equivalent number of nucleotides when DNA or RNA is compared). In that respect, a match is also identified when the DNA or RNA sequences compared are different but both encode the same stretch of at least 8, preferably at least 10, amino acids. The methods may also comprise determining the presence or absence in a tumor of each NOP represented in the libraries disclosed herein.
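The point that two different nucleotide sequences count as a match when they encode the same stretch of amino acids can be checked by translating before comparing. A minimal sketch, assuming both sequences are already in the same reading frame (frame 0) and using the standard genetic code; the sequences are hypothetical synonymous encodings of one 8-residue peptide, and the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_all(seq):
    """Translate every complete codon in frame 0, keeping stops as '*'."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def peptide_kmers(peptide, k):
    """All k-mers of a peptide that do not span a stop codon."""
    return {peptide[i:i + k] for i in range(len(peptide) - k + 1)
            if "*" not in peptide[i:i + k]}


def encodes_shared_peptide(dna_a, dna_b, k=8):
    """True if both DNA sequences encode at least one common k-mer."""
    return bool(peptide_kmers(translate_all(dna_a), k)
                & peptide_kmers(translate_all(dna_b), k))


dna_a = "ATGGACCAGAAGGCCCCACTAAAA"  # M D Q K A P L K
dna_b = "ATGGATCAAAAAGCACCTCTGAAG"  # same peptide, synonymous codons
print(dna_a == dna_b)                              # False
print(encodes_shared_peptide(dna_a, dna_b))        # True
print(encodes_shared_peptide(dna_a, dna_b, k=10))  # False
```

Sequencing reads would in practice have to be translated in all relevant frames, or aligned at the nucleotide level first, before such a comparison.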

In a preferred embodiment a substantial part, preferably all, of the sequences from the set of sequences obtained by sequencing the genome, exome or transcriptome are compared with the database.

Since the database consists of sequences comprising/representing neo open reading frame peptides that were identified in patients and that arise as the consequence of a frame shift mutation as disclosed herein, a match with a sequence obtained from a tumor of the patient indicates that the patient carries a frame shift mutation that leads to the appearance of a neo open reading frame encoding a neo open reading frame peptide/neoantigen whose sequence is comprised in at least one sequence in the database.

The invention thus also provides a method of identifying a frame shift mutation as defined herein in a tumor of a patient and/or identifying a neo open reading frame encoding a neo open reading frame peptide or neoantigen present in the tumor of the patient. With the method, such neo open reading frame peptide(s) or neoantigen(s) may be detected in the patient.

As will be understood, such an identified neo open reading frame peptide or neoantigen may suitably be used for preparing an immunogenic composition, for example a vaccine, using any suitable method or system available to the skilled person.

Once a match has been found between a patient's tumor sequence and a sequence in the database, i.e. a sequence comprising the master neo open reading frame peptide (NOP), an immunogenic composition, for example a vaccine, can be made available to immunize the patient against his/her tumor. In general there are different, non-limiting ways in which the actual vaccine can be provided:

1. Synthetic peptides. Depending on the length of the neo open reading frame peptide or neoantigen, the actual vaccine can be a single peptide (e.g. of 20 amino acids) or a series of tiled peptides (e.g. if the NOP is 80 amino acids long, the vaccine can be the sum of 7 peptides of 20 amino acids, each overlapping the next by 10 amino acids).
2. Nucleic acids encoding the neo open reading frame peptide or neoantigen. This can be an appropriately packaged or formulated RNA or DNA preparation containing the sequence encoding the NOP or a part of it. For such a construct no tiling is necessary, since an open reading frame of tens to hundreds of triplets can easily be generated.
3. A recombinant organism (a virus, or e.g. a genetically modified Listeria bacterium as used in other contexts), in this case expressing the (master) NOP or part of it.
4. In a fourth application the immune system of the patient is not vaccinated but boosted in another way, based on the hit between the patient's tumor sequence and the NOP library: this involves, for example, identifying a mouse or human T-cell that recognizes the antigen and that can be engineered into any CAR-T construct and used to direct a T-cell line against that neo open reading frame peptide or neoantigen or part thereof.
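The tiling in option 1 (an 80-residue NOP covered by 7 peptides of 20 residues, each overlapping the next by 10) can be computed as follows. This is a sketch; the function name and the rule of anchoring one extra window at the C-terminus when the regular step does not end exactly there are assumptions, not part of the described method.

```python
def tile_peptides(nop, length=20, overlap=10):
    """Split a NOP into overlapping peptides of `length` residues.

    If the regular step does not end exactly at the C-terminus, one extra
    window is anchored at the C-terminus so that no residue is left out.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    if len(nop) <= length:
        return [nop]
    step = length - overlap
    starts = list(range(0, len(nop) - length + 1, step))
    if starts[-1] + length < len(nop):
        starts.append(len(nop) - length)
    return [nop[s:s + length] for s in starts]


nop_80 = "ACDEFGHIKLMNPQRSTVWY" * 4  # hypothetical 80-residue NOP
tiles = tile_peptides(nop_80)
print(len(tiles))                # 7
print(tiles[-1] == nop_80[60:])  # True
```

For an 85-residue NOP the same call returns 8 peptides, the last one overlapping its neighbour by 15 rather than 10 residues.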

In some embodiments, the vaccine comprises a pharmaceutically acceptable excipient and/or an adjuvant. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents and the like. Suitable adjuvants are well-known in the art and include but are not limited to, aluminum (or a salt thereof, e.g., aluminium phosphate and aluminium hydroxide), monophosphoryl lipid A, squalene (e.g., MF59), montanide, hiltonol, poly-ICLC (polyriboinosinic-polyribocytidylic acid-polylysine carboxymethylcellulose), liposomes (e.g. CAF09, cationic adjuvant formulation 09), Amplivant, Resiquimod, Iscomatrix and cytosine phosphoguanine (CpG). A skilled person is able to determine the appropriate adjuvant, if necessary, and an immune-effective amount thereof. As used herein, an immune-effective amount of adjuvant refers to the amount needed to increase the vaccine's immunogenicity in order to achieve the desired effect.

In some embodiments, the vaccine can be used for treatment of a tumor, wherein the genome of said tumor is sequenced, NOP sequences are defined based on the genome sequence, and a vaccine is prepared based on the NOP sequence for administration to a patient with cancer. The vaccine may be administered at any time after diagnosis of the cancer.

The disclosure further provides the use of neoantigens and vaccines disclosed herein in prophylactic methods for preventing or delaying the onset of cancer. The vaccine may be specifically used in a prophylactic setting for individuals that have an increased risk of developing cancer. For example, prophylactic vaccination is expected to provide possible protection for up to 50% of all cancers. In some embodiments, the prophylactic vaccination methods are used for individuals who have a genetic cancer predisposition mutation. In some embodiments, the prophylactic methods are useful for individuals who are genetically related to individuals afflicted with cancer. In some embodiments, the prophylactic methods are useful for the general population.

In some embodiments the database (or library) is represented by at least 147 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-147. It was found that with a database of that size about 5-10% of the population of cancer patients is expected to match at least one of the sequences in the database.

In some embodiments the database (or library) is represented by at least 960 sequences selected from SEQ ID NOs 1-1989, preferably by at least SEQ ID NOs 1-960. It was found that with a database of that size about 17-23% of the population of cancer patients is expected to match at least one of the sequences in the database.

In some embodiments the database (or library) is represented by all 1989 sequences of SEQ ID NOs 1-1989. It was found that with a database of that size about 27-33% of the population of cancer patients is expected to match at least one of the sequences in the database.

It will be understood by the skilled person that the database may vary in size (number of sequences selected from SEQ ID NOs 1-1989) without departing from the inventive gist of the present invention. In a preferred embodiment, however, the database is of such size that at least 5%, 6%, 7% and so on, up to 29% or more, of the population of cancer patients (is expected to) match at least one of the sequences in the database.
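The relation between library size and the share of patients with at least one match, together with the ordering of sequences by match frequency described above, can be illustrated with a short calculation. The cohort below is invented and does not reproduce the 5-10%, 17-23% or 27-33% figures, which come from the analysis described herein; the function names are hypothetical.

```python
from collections import Counter


def rank_library(patient_hits):
    """Order database sequence IDs by the number of patients they match."""
    counts = Counter(seq_id for hits in patient_hits.values() for seq_id in hits)
    return sorted(counts, key=lambda seq_id: (-counts[seq_id], seq_id))


def coverage(patient_hits, library):
    """Fraction of patients with at least one hit in `library`."""
    library = set(library)
    matched = sum(1 for hits in patient_hits.values() if hits & library)
    return matched / len(patient_hits)


def smallest_library_for(patient_hits, target):
    """Smallest top-N library reaching the target coverage, or None."""
    ranked = rank_library(patient_hits)
    for n in range(1, len(ranked) + 1):
        if coverage(patient_hits, ranked[:n]) >= target:
            return n
    return None


# Hypothetical cohort: patient -> set of matched database sequence IDs.
cohort = {"p1": {"S1"}, "p2": {"S1", "S2"}, "p3": {"S3"}, "p4": set()}
ranked = rank_library(cohort)
print(ranked)                                             # ['S1', 'S2', 'S3']
print([coverage(cohort, ranked[:n]) for n in (1, 2, 3)])  # [0.5, 0.5, 0.75]
print(smallest_library_for(cohort, 0.75))                 # 3
```

Ranking by single-sequence frequency is a simple greedy heuristic; because one patient can match several sequences (p2 above), a coverage-aware, set-cover style selection could reach the same coverage with fewer sequences.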

Also provided is a method as described herein wherein a plurality of neoantigens for preparing a subject-specific immunogenic composition are identified. As described in the example, more than one neo open reading frame peptide may be recognized in the tumor material of the patient using the method described herein. Therefore, in some embodiments 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more neo open reading frame peptides or neoantigens may be identified in the material obtained from a tumor of the patient. The one or more neo open reading frame peptides or neoantigens (or DNA or RNA encoding such antigens) may be used in preparing the immunogenic composition of the present invention.

It will be understood by the skilled person that the neo open reading frame peptides or antigens identified may be of varying length, for example about 8-10 amino acids, or greater than 10, 15, 30 or 50 amino acids in length. It will also be understood that the peptides (or DNA or RNA encoding such peptides) in the immunogenic composition (e.g. vaccine), which are based on the neo open reading frame peptides or neoantigens identified in the tumor of the patient, may be of the same length as those neo open reading frame peptides or neoantigens, or may comprise only part of them. Preferably the peptide used in the immunogenic composition is at least 8-10 amino acids in length (or, in the case of DNA or RNA encoding such a peptide, an equivalent number of nucleotides, e.g. 24-30 nucleotides). For example, if the neo open reading frame peptide or neoantigen identified in the tumor has a length of 20 amino acids, the peptide to be used in the immunogenic composition may be 8-20 amino acids in length (preferably 10-20 amino acids). The skilled person will also understand that more than one peptide based on the identified neo open reading frame peptide or neoantigen may be included in the immunogenic composition. For example, if the neo open reading frame peptide or neoantigen identified in the patient has a length of 80 amino acids, a first peptide of, for example, 10-40 amino acids and a second, different peptide of 10-40 amino acids may be included. The peptides may or may not overlap in sequence.
Preferably the peptides (or RNA or DNA encoding such peptides) to be included in the immunogenic composition do not induce an autoimmune response in the patient. Preferably the peptides in the immunogenic composition have a length of 18 amino acids or more (in the case of RNA or DNA, a nucleotide sequence of equivalent or greater length may be used).

In another embodiment the method further comprises comparing at least one sequence or portion thereof from the set of sequences of the subject-specific tumor genome, exome or transcriptome with sequences obtained by complete, targeted or partial genome, exome, or transcriptome (e.g. nucleic acid) sequencing of a normal tissue sample obtained from the subject having cancer. Although not required in the method according to the current invention, this step makes it possible to ensure that the recognized neo open reading frame is not comprised in or expressed by healthy tissue of the patient, and thus that the identified neo open reading frame peptide or neoantigen (comprising a neo-epitope) is unique to the tumor of the patient (as compared to its healthy tissue). The pNOPs listed in the accompanying Sequence Listing have been filtered for absence in the wild-type ORFeome, i.e. the pNOPs are not found in the wild-type ORFeome.
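The filtering against the wild-type ORFeome (or against a normal-tissue sample of the same patient) can be approximated by discarding every candidate 8-mer that also occurs in a normal protein. A minimal sketch, assuming exact peptide-level comparison; the sequences and the function name are hypothetical, and the actual filtering procedure used for the Sequence Listing is not specified here.

```python
def novel_kmers(nop, wild_type_proteins, k=8):
    """Return the k-mers of a NOP, in order, that occur in none of the
    wild-type proteins (a simplified stand-in for an ORFeome filter)."""
    wild_type_kmers = {
        protein[i:i + k]
        for protein in wild_type_proteins
        for i in range(len(protein) - k + 1)
    }
    return [nop[i:i + k] for i in range(len(nop) - k + 1)
            if nop[i:i + k] not in wild_type_kmers]


wild_type = ["MDQKAPLKGG"]  # hypothetical normal protein
candidate = "AMDQKAPLKWT"   # hypothetical tumor-derived peptide
print(novel_kmers(candidate, wild_type))  # ['AMDQKAPL', 'DQKAPLKW', 'QKAPLKWT']
```

The discarded 8-mer (MDQKAPLK) is fully contained in the normal protein and therefore cannot serve as a tumor-specific match.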

In another embodiment there is provided a method of treating cancer in a subject comprising administering to the subject (A) a subject-specific immunogenic composition prepared based on a neoantigen identified with the method described herein above. Preferably the composition comprises a therapeutic amount of one or a plurality of neoantigens identified with the method described herein above. The immunogenic composition may comprise antigenic peptides, DNA or RNA encoding such antigenic peptides, or combinations thereof, as identified with the method disclosed herein. The skilled person knows how to prepare, test and administer such immunogenic compositions and may use any method or system available in the art.

Also disclosed herein are compositions that comprise two or more peptides or neoantigens. In some embodiments the composition contains at least two distinct peptides. The at least two distinct peptides may or may not be derived from the same neo open reading frame peptide identified in the patient. By distinct peptides is meant that the peptides vary in length, amino acid sequence or both.

Also disclosed herein, the immunogenic composition or vaccine is capable of raising a specific T-cell response. The vaccine composition comprises mutant peptides and mutant polypeptides corresponding to tumor specific neoantigens identified by the methods described herein. A person skilled in the art can, when desired, select preferred peptides by testing, for example, the generation of T-cells in vitro as well as their efficiency and overall presence, the proliferation, affinity and expansion of certain T-cells for certain peptides, and the functionality of the T-cells, e.g. by analyzing the IFN-γ production or tumor killing by T-cells. However this is not required, given that NOPs are in their entirety foreign to the body and thus potentially highly antigenic.

In some embodiments the method of treatment above further comprises administering to the subject (B) an anti-CTLA-4 agent, an anti-PD-1 agent, an anti-PD-L1 agent or a combination thereof, wherein (A) and (B) are administered simultaneously or sequentially.

The neoantigenic peptide or vaccine composition can be administered alone or in combination with other therapeutic agents. The therapeutic agent is, for example, a chemotherapeutic agent, radiation, or immunotherapy. Any suitable therapeutic treatment for a particular cancer may be administered. Examples of chemotherapeutic agents include, but are not limited to, bleomycin, capecitabine, carboplatin, cisplatin, cyclophosphamide, docetaxel, doxorubicin, etoposide, interferon alpha, irinotecan, lansoprazole, levamisole, methotrexate, metoclopramide, mitomycin, omeprazole, ondansetron, paclitaxel, pilocarpine, rituximab, tamoxifen, taxol, trastuzumab, vinblastine, and vinorelbine tartrate.

The subject may, in some embodiments, additionally be administered an anti-immunosuppressive/immunostimulatory agent. For example, the subject is further administered an anti-CTLA, anti-PD-1 or anti-PD-L1 antibody. Blockade of CTLA-4 or PD-L1 by antibodies can enhance the immune response to cancerous cells in the patient. In particular, CTLA-4 blockade has been shown to be effective when administered following a vaccination protocol.

The optimum amount of each peptide to be included in the vaccine composition and the optimum dosing regimen can be determined by one skilled in the art without undue experimentation. The composition may be prepared for injection of the peptide, or of DNA or RNA encoding the peptide. For example, doses of between 1 and 500 mg, or between 50 μg and 1.5 mg, preferably 125 μg to 500 μg, of peptide or DNA may be given, depending on the respective peptide or DNA. Other methods of administration of the immunogenic compositions are known to the skilled person.

The immunogenic composition may be prepared so that the selection, number and/or amount of peptides present in the composition is tissue-, cancer- and/or patient-specific. The selection may depend on the specific type of cancer, the status of the disease, earlier treatment regimens, the immune status of the patient and the HLA haplotype of the patient. Furthermore, the vaccine can contain individualized components, according to the personal needs of the particular patient. Examples include varying the amounts of peptides according to the expression of the related neoantigen in the particular patient.

In therapeutic applications, compositions are administered to a patient in an amount sufficient to elicit an effective CTL response to the tumor antigen and to cure, or at least partially arrest, symptoms and/or complications. An amount adequate to accomplish this is defined as a “therapeutically effective dose.”

For therapeutic use, administration should preferably begin at or shortly after the detection or surgical removal of tumors. This is followed by booster doses until symptoms have at least substantially abated, and for a period thereafter. For that reason, the ability to provide the immunogenic composition off the shelf, or within a short period of time, is very important. Preferably, the immunogenic compositions are administered parenterally, e.g., intravenously, subcutaneously, intradermally, intramuscularly, or otherwise. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH-adjusting and buffering agents, tonicity-adjusting agents, wetting agents and the like.

For therapeutic purposes, nucleic acids encoding the peptide, and optionally one or more of the peptides described herein, can also be administered to the patient. A number of methods can conveniently be used to deliver the nucleic acids to the patient. For instance, the nucleic acid can be delivered directly, as “naked DNA”. The peptides and polypeptides can also be expressed by attenuated viral hosts, such as vaccinia or fowlpox. This approach involves the use of vaccinia virus as a vector to express nucleotide sequences that encode the peptide. Upon introduction into the subject, the recombinant vaccinia virus expresses the immunogenic peptide and thereby elicits a host CTL response. Vaccinia vectors and methods useful in immunization protocols are described in, e.g., U.S. Pat. No. 4,722,848. Another vector is BCG (Bacille Calmette-Guérin), as described in Stover et al. (Nature 351:456-460 (1991)).

According to another aspect, there is provided a computerized system for identifying a match in the method as described herein, the system comprising:

-   a first database configured to store information regarding at least one sequence in a set of sequences obtained from a subject-specific tumor genome, exome or transcriptome obtained by complete, targeted or partial genome, exome, or transcriptome sequencing;
-   a second database configured to store information regarding at least 147 sequences selected from SEQ ID NO 1-1989, preferably at least SEQ ID NO 1-147; and
-   a processor configured to perform instructions for identifying a match by comparing at least one sequence in the set of sequences from the subject-specific tumor genome, exome or transcriptome in the first database with the sequences in the second database,

wherein a match is identified when the compared sequences have a string in common representative of at least 8, preferably at least 10, amino acids, to thereby identify a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer.

Embodiments, preferences and aspects already described herein above with respect to the method of identifying and the method of treatment likewise apply to the above embodiment.
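The comparison performed by the processor can be illustrated with a minimal Python sketch. It assumes that the tumor sequences in the first database have already been translated into amino acid strings and that the second database is available as a mapping from an identifier to a pNOP sequence; the function names, identifiers and tumor sequences are hypothetical. A match is reported when a tumor sequence and a database sequence share an exact common substring of at least k = 8 amino acids, detected here through a k-mer index, which is only one of several possible ways to implement the comparison.

```python
from collections import defaultdict


def build_kmer_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer occurring in a database sequence to the IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for seq_id, sequence in database.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(seq_id)
    return index


def find_matches(patient: dict[str, str], database: dict[str, str], k: int = 8) -> dict[str, set[str]]:
    """Per tumor sequence, return the database IDs sharing an exact substring of >= k residues."""
    index = build_kmer_index(database, k)
    matches: dict[str, set[str]] = {}
    for patient_id, sequence in patient.items():
        hits: set[str] = set()
        for i in range(len(sequence) - k + 1):
            # .get() does not insert new keys into the defaultdict
            hits |= index.get(sequence[i:i + k], set())
        if hits:
            matches[patient_id] = hits
    return matches


if __name__ == "__main__":
    # Hypothetical "second database"; IDs are illustrative, sequences are taken from the examples herein.
    pnop_db = {
        "SEQ584": "RWSGPSSASYPSGRKFACGVFG",
        "SEQ333_Cterm": "SRHVGAAAPAWLQGTPRAQDPTVRPGTATRSPAQPSAQRTCCSAQQSSVTRRGVLALGYA",
    }
    # Hypothetical translated tumor sequences ("first database").
    tumor = {
        "tx1": "MKTAYIAKQRWSGPSSASYPS",  # shares a 12-residue stretch with SEQ584
        "tx2": "XXWSGPSSAXX",            # shares only 7 residues: below the threshold
    }
    print(find_matches(tumor, pnop_db))  # {'tx1': {'SEQ584'}}
```

Passing `k=10` corresponds to the preferred threshold of at least 10 amino acids. The sketch performs exact amino acid matching only and does not model nucleotide-level comparison or sequencing errors.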

The computerized system may, for example, be a web-based interface that allows sequence information of a patient to be uploaded from anywhere in the world, in any suitable computer-readable format, to form the first database, which is then compared with the second database to identify a match as disclosed herein. Based on the matches identified, and in accordance with the disclosure herein, an immunogenic composition that is specific for the patient may be prepared. Such an immunogenic composition may then be available to the patient within a short period of time, allowing early commencement of therapeutic treatment.

Alternatively, the computerized system may take the form of software that can be executed on a computer device.

Finally, there is provided an anti-CTLA agent, an anti-PD-1 agent, an anti-PD-L1 agent or a combination thereof for use in the treatment of cancer in a subject, wherein the treatment further comprises administering to the subject a subject-specific immunogenic composition comprising a therapeutic amount of a plurality of neoantigens identified with the method as disclosed herein, or DNA or RNA encoding such neoantigens, and wherein the agent and the plurality of neoantigens, or the DNA or RNA encoding such neoantigens, are administered simultaneously or sequentially.

Preferences, particularities and considerations expressed herein in the context of any other embodiment likewise apply to the above embodiment.

Indeed, it will be understood that all details, embodiments and preferences discussed with respect to one aspect or embodiment of the invention are likewise applicable to any other aspect or embodiment of the invention, and that there is therefore no need to detail all such details, embodiments and preferences for each aspect separately.

Having now generally described the invention, the same will be more readily understood through reference to the following examples, which are provided by way of illustration and are not intended to be limiting of the present invention. Further aspects and embodiments will be apparent to those skilled in the art.

EXAMPLES

The present invention is illustrated using data from patient TCGA-14-0817-01A-01D-1492-08 of The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/). The patient had undergone resection of a glioblastoma, a brain tumor. The standard TCGA output for the genome sequence contains a list of somatic mutations relative to the patient's germline sequence. We took the subset of frameshift (FS) mutations and compared it against the database of predicted novel open reading frame peptides (pNOPs) disclosed herein; here the comparison was made against the complete list of possible out-of-frame open reading frames (SEQ. ID. NO. 1-1989), i.e. the 1989 pNOPs represented in the Sequence Listing. We found a match with pNOP587802 in the gene TP53 (a well-known tumor suppressor gene), encountered 7 times in the TCGA reference cohort, and a match with pNOP4985 in the gene SSPO (a spondin-encoding gene), encountered 9 times in the reference cohort. Both pNOPs are represented in the sequence libraries described herein, including the sub-library of 960 pNOPs (SEQ. ID. NO. 1-960) that covers 20% of all TCGA cancer patients. For these particular NOPs, 20-amino-acid vaccine peptides are pre-synthesized and can be distributed off the shelf in freeze-dried form.

The frameshift in the p53 gene generates a 21 amino acid NOP: RWSGPSSASYPSGRKFACGVFG (SEQ. ID. NO. 584), where the 20-mer peptide WSGPSSASYPSGRKFACGVFG (comprised in SEQ. ID. NO. 584) is represented as vaccine.
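Why a frameshift mutation yields a neo open reading frame peptide can be illustrated with a toy Python sketch: deleting a single nucleotide shifts the downstream reading frame, so every codon from the mutation onward is read out of frame until a stop codon is reached. The DNA sequence below is a hypothetical toy construct, not the patient's TP53 sequence, the helper names are illustrative, and the standard genetic code is assumed.

```python
# Standard genetic code; codons enumerated in T, C, A, G order at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_until_stop(dna: str) -> str:
    """Translate in frame 0 from the first base, stopping at a stop codon or the sequence end."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # "X" for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def frameshift_peptide(dna: str, position: int, deleted: int = 1) -> str:
    """Hypothetical helper: mutant peptide from the codon hit by a deletion onward."""
    if deleted % 3 == 0:
        raise ValueError("an in-frame deletion does not shift the reading frame")
    mutant = dna[:position] + dna[position + deleted:]
    return translate_until_stop(mutant)[position // 3:]


if __name__ == "__main__":
    wild_type = "ATGGCCAAGTTTGGCTAA"                    # toy ORF: M A K F G, then stop
    print(translate_until_stop(wild_type))             # MAKFG
    print(frameshift_peptide(wild_type, position=3))   # PSLA
```

In the toy case, deleting one base turns the in-frame residues AKFG into the out-of-frame residues PSLA; in a real tumor the new frame continues until the first out-of-frame stop codon, which is what defines the length of a NOP.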

The mutation in SSPO occurs relatively far upstream within pNOP4985, so that the patient's tumor carries 174 consecutive amino acids in the new reading frame:

GPGSGLPPTLRQPGGVPRVKVTGRNCRAATQCVGQRCSAGRPGLPGPPAPKAALPREGALAGAVVPDSAPALGIHPAQEMPPRRSPAAPLYAQCQASGVCGLPGPLAQPPVMEASRHVGAAAPAWLQGTPRAQDPTVRPGTATRSPAQPSAQRTCCSAQQSSVTRRGVLALGYA (SEQ. ID. NO. 333).

Because previous data had indicated that this pNOP occurs quite often (approximately 1 in 1000 cancers), we had generated freeze-dried peptides based on a tiling path: for this region there are 5 peptide vaccines, each shifted 10 amino acids relative to the previous one, together covering the C-terminal 60 amino acids multiple times. The vaccine thus contains these 5 peptides (listed from the C-terminus toward the N-terminus):

CCSAQQSSVTRRGVLALGYA, SPAQPSAQRTCCSAQQSSVT, PTVRPGTATRSPAQPSAQRT, WLQGTPRAQDPTVRPGTATR and SRHVGAAAPAWLQGTPRAQD (all comprised in SEQ. ID. NO. 333).
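The tiling path for the SSPO NOP can be reproduced with a short Python sketch. It assumes the parameters of this example (20-amino-acid peptides, a step of 10 and 5 peptides); the function name is hypothetical. Starting at the C-terminus of SEQ. ID. NO. 333, it emits overlapping windows moving toward the N-terminus, so the 5 peptides span the last 20 + 4 × 10 = 60 amino acids, and every residue in that stretch except the outermost 10 at each end is covered by two peptides.

```python
# 174-residue new-frame sequence encoded by the SSPO frameshift in this example (SEQ. ID. NO. 333).
SSPO_NOP = (
    "GPGSGLPPTLRQPGGVPRVKVTGRNCRAATQCVGQRCSAGRPGLPGPPAPKAALP"
    "REGALAGAVVPDSAPALGIHPAQEMPPRRSPAAPLYAQCQASGVCGLPGPLAQPPV"
    "MEASRHVGAAAPAWLQGTPRAQDPTVRPGTATRSPAQPSAQRTCCSAQQSSVTRR"
    "GVLALGYA"
)


def tiling_path(sequence: str, length: int = 20, step: int = 10, count: int = 5) -> list[str]:
    """Return up to `count` overlapping peptides, tiled from the C-terminus toward the N-terminus."""
    peptides = []
    end = len(sequence)
    for _ in range(count):
        start = end - length
        if start < 0:  # the next window would run past the N-terminus
            break
        peptides.append(sequence[start:end])
        end -= step
    return peptides


if __name__ == "__main__":
    for peptide in tiling_path(SSPO_NOP):
        print(peptide)
```

Run as-is, it prints the five vaccine peptides listed above in the same order. Other values of `length`, `step` or `count` give other tilings; the sketch says nothing about which tiling is immunologically preferable.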

Next, within 24 hours of our receiving the genome sequence of the patient's tumor, the physician is informed that we can offer 6 peptides in freeze-dried form for vaccination against the two long antigens, together with their certificates of analysis according to GMP regulations, for delivery to the hospital pharmacy within 24 hours of ordering. These are completely personalized vaccines, based on the patient's tumor genome sequence. The amount of peptide shipped is sufficient for the initial weeks of vaccination, and more can be ordered.

Thus the time between submission of the genome sequence and vaccination can in principle be as short as 24-48 hours (in practice possibly closer to a week). In other words, for patients with tumors such as this patient's (average life expectancy after diagnosis of less than a year), no time is lost. In addition, because these NOPs recur across patients, their synthesis, purification, sterilization and formulation benefit enormously from economies of scale compared with peptide vaccines synthesized de novo on a case-by-case basis, so that the costs for the patient, hospital or insurer can be much lower. Finally, where possible, feedback on treatment outcome is collected, so that knowledge eventually accumulates on which peptides work (even) better than others; these data can eventually be made available. Note that such a body of outcome evidence will not be attainable for SNV-derived neoantigens, which differ from patient to patient.

Having now fully described this invention, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation.

All references cited herein, including journal articles or abstracts, published or corresponding patent applications, patents, or any other references, are entirely incorporated by reference herein, including all data, tables, figures, and text presented in the cited references. Additionally, the entire contents of the references cited within the references cited herein are also entirely incorporated by reference.

Reference to known method steps, conventional method steps, known methods or conventional methods is not in any way an admission that any aspect, description or embodiment of the present invention is disclosed, taught or suggested in the relevant art.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art (including the contents of the references cited herein), readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein.

It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one of ordinary skill in the art. 

1. (canceled)
 2. A method of identifying at least one neoantigen for preparing a subject-specific immunogenic composition, the method comprising: a. performing complete, targeted or partial genome, exome, or transcriptome sequencing of at least one tumor sample obtained from a subject having cancer to obtain a set of sequences of the subject-specific tumor genome, exome or transcriptome; b. comparing at least one sequence or portion thereof from the set of sequences with a database of nucleotide and/or polypeptide sequences, wherein the sequences in the database are represented by at least SEQ ID NO 1-147; c. identifying a match between the at least one sequence or portion thereof from the set of sequences and a sequence in the database when they have a string in common representative of at least 8, preferably at least 10, amino acids, to thereby identify a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer; and optionally d. repeating steps b.-c. for a further sequence or portion thereof from the set of sequences of the subject-specific tumor genome, exome or transcriptome.
 3. The method of claim 2, wherein the sequences in the database are identified by at least SEQ ID.NO 1-960, or SEQ ID.NO 1-1989.
 4. The method of claim 2, wherein a plurality of neoantigens for preparing a subject-specific immunogenic composition are identified.
 5. The method of claim 2, wherein the neoantigen is about 8-10 amino acids in length, greater than 10 amino acids in length, greater than 15 amino acids in length, greater than 30 amino acids in length, or greater than 50 amino acids in length.
 6. The method of claim 2, wherein the method further comprises comparing at least one sequence or portion thereof from the set of sequences of the subject-specific tumor genome, exome or transcriptome with sequences obtained by complete, targeted or partial genome, exome, or transcriptome sequencing of a normal tissue sample obtained from the subject having cancer.
 7. A method of treating cancer in a subject comprising administering (A) a subject-specific immunogenic composition prepared based on a neoantigen identified with the method of claim 2.
 8. The method of treating cancer of claim 7, further comprising administering to the subject (B) an anti-CTLA agent, an anti-PD-1 agent, an anti-PD-L1 agent or a combination thereof, wherein (A) and (B) are administered simultaneously or sequentially.
 9. The method of treating cancer of claim 7, further comprising one or more immune-stimulating adjuvant(s), including but not limited to liposomes, Montanide ISA51, Hiltonol, CAF09, Amplivant, Resiquimod, Iscomatrix, CpG, poly-ICLC, aluminium salts, monophosphoryl lipid A and/or squalene.
 10. The method of treating cancer of claim 7, wherein the plurality of neoantigens comprises at least 4, 8 or 12 neoantigens.
 11. A computerized system for identifying a match in the method of claim 2, the system comprising: a first database configured to store information regarding at least one sequence in a set of sequences obtained from a subject-specific tumor genome, exome or transcriptome obtained by complete, targeted or partial genome, exome, or transcriptome sequencing; a second database configured to store information regarding at least 147 sequences selected from SEQ ID NO 1-1989, preferably at least SEQ ID NO 1-147; and a processor configured to perform instructions for identifying a match by comparing at least one sequence in the set of sequences from the subject-specific tumor genome, exome or transcriptome in the first database with the sequences in the second database, wherein a match is identified when the compared sequences have a string in common representative of at least 8, preferably at least 10, amino acids, to thereby identify a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer.
 12. (canceled)
 13. A method of identifying at least one neoantigen for preparing a subject-specific immunogenic composition, the method comprising: a. performing complete, targeted or partial genome, exome, or transcriptome sequencing of at least one tumor sample obtained from a subject having cancer to obtain a set of sequences of the subject-specific tumor genome, exome or transcriptome; b. determining the presence or absence of a match between the set of sequences from the tumor sample and each of SEQ ID NO 1-147; wherein a match between sequences is present when the sequences have in common a string of at least 8, preferably at least 10, amino acids, and whereby the presence of a match identifies a tumor-specific neoantigen encoded by a tumor-specific frameshift mutation in genes of the subject having cancer.